
feat(kubescape): route runtime-detection alerts to Headlamp, Slack, and Coroot #2445

Merged
devantler merged 2 commits into main from claude/kubescape-runtime-alerts
Jul 5, 2026
Conversation

@devantler

Contributor

🤖 Generated by the Daily AI Assistant

Why

The Headlamp Kubescape plugin's Runtime Detection → Alerts page showed "Alertmanager URL is not configured", and Kubescape's runtime-detection alerts (rule violations, malware) were going nowhere. That tab reads only from a Prometheus Alertmanager, which the Coroot migration removed from the cluster, so there was no source to point it at.

What

Reintroduces a single, tiny, Kubescape-scoped Alertmanager (prod-only; not a return of the old Prometheus stack) and points the node-agent at it, so runtime alerts now reach all three intended places: the Headlamp plugin, Slack (the existing shared webhook), and Coroot (via the node-agent's stdout, which Coroot's log capture surfaces).

Operational notes

  • One manual step: the Headlamp plugin's Alertmanager address is a per-browser setting that can't be seeded declaratively (headlamp#3979). Set it once per operator to kubescape/alertmanager:9093. Until then the data source exists but the tab stays empty. Documented in docs/dr/alerting.md.
  • New dependency: the prometheus-community Alertmanager Helm chart.
  • Needs a direct merge after promotion (trusted-author PR).
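
For orientation, the node-agent wiring described under "What" could look roughly like the patch below. This is a minimal sketch, not the PR's actual helm-release-patch.yaml: the value keys `nodeAgent.config.alertManagerExporterUrls` and `nodeAgent.config.stdoutExporter` are named in this PR, but the HelmRelease name, the API version, and the in-cluster Service address (including its host:port form) are assumptions.

```yaml
# Sketch of a Kustomize patch for the Kubescape HelmRelease.
# ASSUMPTIONS: HelmRelease name, apiVersion, and the Service DNS name below.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kubescape          # hypothetical release name
  namespace: kubescape
spec:
  values:
    nodeAgent:
      config:
        # Headlamp and Slack path: push each runtime alert to the Alertmanager.
        alertManagerExporterUrls:
          - alertmanager.kubescape.svc.cluster.local:9093   # assumed address
        # Coroot path: keep alerts on stdout so log capture can surface them.
        stdoutExporter: true
```

Keeping both exporters enabled is what gives the fan-out: the Alertmanager serves Headlamp and Slack, while stdout serves Coroot.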

…nd Coroot

The Headlamp Kubescape plugin's "Runtime Detection > Alerts" tab warned
"Alertmanager URL is not configured" because that tab reads ONLY from a
Prometheus Alertmanager (GET /api/v2/alerts), and the Coroot migration removed
Alertmanager from the cluster — so there was no source and the node-agent
exported its runtime alerts nowhere.

Reintroduce a single minimal Alertmanager (prometheus-community chart 1.40.1,
~10m/32Mi, emptyDir, hardened securityContext) scoped to the kubescape namespace,
prod-only — NOT a re-adoption of the Prometheus stack. Wire the node-agent to fan
each alert out to all three destinations:

  * Headlamp — nodeAgent.config.alertManagerExporterUrls -> the Alertmanager,
    which the plugin queries. (One manual per-user step remains: set
    "kubescape/alertmanager:9093" in the plugin settings; the address is
    browser-local, not declaratively seedable — headlamp#3979.)
  * Slack — the Alertmanager slack_configs receiver -> the shared
    ${alertmanager_webhook_url} incoming-webhook (same channel as Coroot/Flux).
  * Coroot — nodeAgent.config.stdoutExporter (default) -> Coroot's eBPF log
    capture surfaces the alert in its Logs view (Coroot CE has no alert receiver).

Adds a CiliumNetworkPolicy allowing the Headlamp API-server Service-proxy to
reach :9093 and the Alertmanager to reach hooks.slack.com; documents the design
and the manual Headlamp step in docs/dr/alerting.md.

Validated: ksail --config ksail.prod.yaml workload validate (485 files),
kustomize build of the hetzner controllers overlay, and the naming CI check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
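
The Slack leg of the commit above can be pictured as chart values along these lines. This is a hedged sketch rather than the merged helm-release.yaml: `slack_configs`, `api_url_file`, and the chart's `extraSecretMounts` field are real, and the `slack-webhook-url` key appears in the review below, but the receiver name, the Secret name, and the mount path are hypothetical.

```yaml
# Sketch of prometheus-community/alertmanager chart values for Slack delivery.
# ASSUMPTIONS: receiver name, Secret name, and mount path are illustrative only.
config:
  route:
    receiver: slack                 # hypothetical receiver name
    group_by: [alertname]
  receivers:
    - name: slack
      slack_configs:
        # Read the webhook from a mounted file so the URL never appears in values.
        - api_url_file: /etc/alertmanager/secrets/slack-webhook-url
          send_resolved: true
extraSecretMounts:
  - name: slack-webhook
    mountPath: /etc/alertmanager/secrets
    secretName: alertmanager-slack  # hypothetical Secret holding slack-webhook-url
    readOnly: true
```

Reading the URL from a file ties the Secret key name to the path in `api_url_file`, which is the key-to-path coupling the review checks.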
@coderabbitai

coderabbitai Bot commented Jul 4, 2026


Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a Kubescape-scoped Alertmanager deployment, network policy, node-agent exporter wiring, and documentation for runtime-detection alert fan-out to Slack, Coroot, and Headlamp.

Changes

Kubescape Alertmanager deployment and wiring

| Layer / File(s) | Summary |
| --- | --- |
| Helm repository and Alertmanager release configuration: k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml, k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml, k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml | Adds the Helm repository, Alertmanager release, and Slack webhook Secret used by the release. |
| CiliumNetworkPolicy for Alertmanager traffic: k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml | Adds ingress on port 9093 and egress to Slack and DNS for Alertmanager pods. |
| Alertmanager kustomization resource list: k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml | Defines the Kustomize manifest that includes the Alertmanager repository, release, secret, and network policy resources. |
| Kubescape node-agent exporter patch and controller wiring: k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml, k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml | Adds the Kubescape HelmRelease patch for runtime-detection exporters and wires the alertmanager directory and patch into the controllers kustomization. |
| Alerting documentation update: docs/dr/alerting.md | Documents the Kubescape runtime-detection alert integration, the three alert destinations, and the manual Headlamp Alertmanager URL configuration. |

Estimated code review effort: 3 (Moderate) | ~25 minutes
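
As a rough illustration of the network-policy change summarized above (ingress on port 9093, egress to Slack and DNS), a CiliumNetworkPolicy could be shaped like this. It is a sketch under stated assumptions: the policy name, the pod label, the use of the `kube-apiserver` entity for Service-proxy traffic, and the kube-dns selector are not taken from the PR and may differ from the merged file.

```yaml
# Sketch only. ASSUMPTIONS: policy name, pod label, source entity, DNS selector.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-alertmanager          # hypothetical name
  namespace: kubescape
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager   # assumed pod label
  ingress:
    # Headlamp reaches Alertmanager through the API-server Service proxy.
    - fromEntities:
        - kube-apiserver
      toPorts:
        - ports:
            - port: "9093"
              protocol: TCP
  egress:
    # DNS visibility is needed for the toFQDNs rule below to resolve names.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Slack incoming-webhook delivery.
    - toFQDNs:
        - matchName: hooks.slack.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Which source entity actually matches Service-proxy traffic depends on the cluster's Cilium setup, so that rule in particular should be verified against the real policy.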

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: routing Kubescape runtime-detection alerts to Headlamp, Slack, and Coroot. |
| Description check | ✅ Passed | The description is directly related to the changeset and accurately explains the new Alertmanager flow and operational notes. |


@devantler

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml (1)

17-18: 🧹 Nitpick | 🔵 Trivial

Verify prod substitution and consider surfacing delivery failures.

Syntax and key-to-path coupling verified correct. One operational note: if alertmanager_webhook_url is ever missing/renamed in the prod variables Secret, this silently falls back to the .invalid placeholder rather than failing reconciliation, so Slack delivery would quietly break. Alertmanager exposes alertmanager_notifications_failed_total; consider ensuring it's scraped/alerted on (e.g., via Coroot) so a bad substitution doesn't go unnoticed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml`
around lines 17 - 18, The Alertmanager secret’s
`${alertmanager_webhook_url:=...}` fallback can hide a missing or renamed prod
variable by silently using the placeholder URL, so check the `slack-webhook-url`
substitution path in the secret generation flow and make it fail or surface an
obvious configuration error when the variable is absent. Also ensure
`alertmanager_notifications_failed_total` is being scraped and alerted on (for
example through Coroot) so broken Slack delivery is detected quickly.
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml (1)

37-86: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Consider disabling the default-mounted service account token.

The pod's securityContext is hardened extensively (drop ALL, readOnlyRootFilesystem, non-root, seccomp), but automountServiceAccountToken is left at the chart's default (true), even though this Alertmanager instance has no need to call the Kubernetes API.

🔒 Suggested addition
     fullnameOverride: alertmanager
     replicaCount: 1
+    automountServiceAccountToken: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml`
around lines 37 - 86, Disable the default-mounted service account token in the
Alertmanager Helm values by setting automountServiceAccountToken to false
alongside the existing podSecurityContext and securityContext hardening in the
alertmanager Helm release values. This Alertmanager instance does not need
Kubernetes API access, so add the setting in the same values block that defines
fullnameOverride, persistence, and extraSecretMounts to keep the pod
least-privileged.
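
Put together with the hardening this finding mentions, the relevant chart values might read as follows. This is a sketch, not the file under review: `fullnameOverride`, `replicaCount`, `automountServiceAccountToken`, and the ~10m/32Mi and emptyDir figures come from this PR, while the exact securityContext fields, `allowPrivilegeEscalation`, and the seccomp profile type are assumptions.

```yaml
# Sketch of least-privilege values for the prometheus-community/alertmanager chart.
# ASSUMPTIONS: exact securityContext fields and the seccomp profile type.
fullnameOverride: alertmanager
replicaCount: 1
# This Alertmanager never calls the Kubernetes API, so no token is mounted.
automountServiceAccountToken: false
podSecurityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault            # assumed profile
securityContext:
  allowPrivilegeEscalation: false   # assumed; not stated in the review
  readOnlyRootFilesystem: true
  capabilities:
    drop: [ALL]
persistence:
  enabled: false                    # falls back to an emptyDir volume
resources:
  requests:
    cpu: 10m
    memory: 32Mi
```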

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5261491e-8f47-45cc-badd-4a6d7130a7f0

📥 Commits

Reviewing files that changed from the base of the PR and between 32ce888 and 4efee1b.

📒 Files selected for processing (8)
  • docs/dr/alerting.md
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{yaml,yml}: Use Kustomize overlays rather than editing base resources directly; k8s/bases/ is immutable from overlays and changes should be made with patches: in provider or cluster overlays.
Keep manifest changes small and use YAML/schema validation before submitting a manifest PR; for files with cluster context, prefer ksail workload validate / kubectl kustomize / kubectl apply --dry-run=client as appropriate.

Files:

  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
k8s/**

📄 CodeRabbit inference engine (AGENTS.md)

k8s/**: Respect Flux dependency order: bootstrap → infrastructure-controllers → infrastructure → apps, with the prod-only infrastructure-overprovisioning layer hanging off infrastructure without gating apps.
Follow the hierarchical Kustomization flow: base configurations in k8s/bases/ feed provider overlays in k8s/providers/, which feed cluster overlays in k8s/clusters/.

Files:

  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
🧠 Learnings (1)
📚 Learning: 2026-07-01T21:13:36.950Z
Learnt from: devantler
Repo: devantler-tech/platform PR: 2359
File: k8s/bases/apps/actual-budget/helm-release.yaml:62-111
Timestamp: 2026-07-01T21:13:36.950Z
Learning: When reviewing Kustomize/Helm YAML in this repo, keep the base vs provider overlay split: `k8s/bases/apps/**` and `k8s/bases/infrastructure/**` should contain each app’s full, environment-agnostic configuration (including base-level postRenderer Kustomize patches such as deployment strategy, topology spread, probes, and env injection). `k8s/providers/{docker,hetzner}/**` should only add small provider-specific deltas (e.g., `interval`, `persistence.size`) via patch files (like `k8s/providers/<provider>/apps/<app>/patches/helm-release-patch.yaml`). If configuration is identical across providers (e.g., OIDC/OAuth env vars where `${domain}` is resolved per cluster via envsubst), it belongs in the base and must not be duplicated into provider overlays.

Applied to files:

  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
  • k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
  • k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
🪛 markdownlint-cli2 (0.22.1)
docs/dr/alerting.md

[warning] 122-122: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (8)
docs/dr/alerting.md (2)

122-124: 📐 Maintainability & Code Quality | ⚡ Quick win

Add a language tag to the example fence.

Line 122 trips MD040. Mark the block as text (or console) so docs lint cleanly.

Proposed fix
-```
+```text
 kubescape/alertmanager:9093

Source: Linters/SAST tools

126-128: 🎯 Functional Correctness | ⚡ Quick win

Verify the proxy-RBAC note.

This section says the plugin reads via the API-server service proxy, but the get/create permission claim is specific enough that it should be confirmed against the actual RBAC rule before publishing. If the binding only grants get on services/proxy, this will mislead operators.

k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml (1)

1-10: LGTM!

k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml (1)

1-118: LGTM! Chart version, extraSecretMounts field, and emptyDir-on-disabled-persistence behavior all verified against the upstream prometheus-community/alertmanager chart.

k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml (1)

1-9: LGTM!

k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml (1)

18-31: 🩺 Stability & Availability

Cross-file dependency is already covered

allow-kubescape already allows intra-namespace traffic and DNS egress for every kubescape pod, so Alertmanager doesn’t need additional rules for the node-agent path or hooks.slack.com resolution.

Likely an incorrect or invalid review comment.

k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml (1)

1-34: LGTM!

k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml (1)

11-17: LGTM!

Also applies to: 80-83

…anager

It never calls the Kubernetes API; chart default is true.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@devantler

Contributor Author

🤖 Generated by the Daily AI Assistant

Review-body nitpick resolution record (no threads exist for these):

  • automountServiceAccountToken — valid, fixed in dced919 (false at chart top level; this Alertmanager never calls the Kubernetes API).
  • .invalid fallback should fail / surface — refuted as-designed: the ${alertmanager_webhook_url:=…invalid} default is deliberate. Flux postBuild substitution failing on a missing var would block the ENTIRE hetzner infrastructure Kustomization (every controller), which is a far worse failure mode than one alert route going quiet; the placeholder keeps the blast radius at 'Slack delivery off'. The alertmanager_notifications_failed_total scrape/alert idea is a fair follow-up for the Coroot integration but out of scope for this PR.

@devantler

Contributor Author

🤖 Generated by the Daily AI Assistant

Resolution record for the CodeRabbit review-body nitpicks (2026-07-04 12:27Z review — no inline threads exist for these):

  1. Silent ${alertmanager_webhook_url:=…invalid} fallback — keeping as-is, by design. The inline default is the repo-wide Flux-substitution convention that keeps ksail workload validate (no SOPS access) and the local/CI overlays building; a strict/hard-fail substitution would wedge the entire hetzner infrastructure Kustomization on one missing variable (the exact blast-radius class of the 2026-07-02 prod wedge). Misconfiguration is not silent in practice: the variable is the SAME one Coroot incidents and the Flux notification-controller already post to (a rename breaks visibly in three systems), the RFC-2606 .invalid placeholder makes a bad substitution obvious in the rendered config, and Alertmanager logs failed notifications — which Coroot's log capture ingests via the same channel as the node-agent stdout alerts.
  2. automountServiceAccountToken: false — already in the HelmRelease values (helm-release.yaml, "never calls the Kubernetes API" block); the finding is stale against the current head.
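
To make the substitution argument in item 1 concrete, here is a minimal sketch of Flux post-build substitution with an inline default. Every name in it (the Kustomization, its path and source, the variables Secret, and `example_var`) is hypothetical and chosen for illustration; only the `postBuild.substituteFrom` field and the `${var:=default}` form are standard Flux features.

```yaml
# Sketch only. ASSUMPTIONS: all resource names, the path, and example_var.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure              # hypothetical
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  path: ./k8s/clusters/prod/infrastructure   # hypothetical path
  sourceRef:
    kind: GitRepository
    name: flux-system               # hypothetical source
  postBuild:
    substituteFrom:
      - kind: Secret
        name: variables             # hypothetical variables Secret
---
# A manifest built by that Kustomization. If example_var is absent from the
# Secret, the inline default is rendered instead of the build depending on it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example
  namespace: default
data:
  greeting: "${example_var:=fallback-value}"
```

The trade-off matches the one argued above: an inline default keeps the whole Kustomization reconciling when one variable is missing, at the cost of that single value quietly taking its placeholder.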

@devantler devantler marked this pull request as ready for review July 5, 2026 07:00
@devantler devantler added this pull request to the merge queue Jul 5, 2026
Merged via the queue into main with commit beaafb4 Jul 5, 2026
15 checks passed
@devantler devantler deleted the claude/kubescape-runtime-alerts branch July 5, 2026 07:39
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jul 5, 2026
@botantler-1

botantler-1 Bot commented Jul 5, 2026

Contributor

🎉 This PR is included in version 1.102.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@botantler-1 botantler-1 Bot added the released label Jul 5, 2026